Brazilian Journal of Microbiology (2009) 40:411-416 
ISSN 1517-8382 



STRUCTURAL AND FUNCTIONAL ANALYSIS OF GIANT STRONG COMPONENT OF 
BACILLUS THURINGIENSIS METABOLIC NETWORK 

Ding, D.W. 1,2; Ding, Y.R. 1*; Li, L.N. 3; Cai, Y.J. 4; Xu, W.B. 1*

1 School of Information Technology, Jiangnan University, Wuxi 214036, China; 2 Department of Mathematics and Computer Science,
Chizhou College, Chizhou 247000, China; 3 Department of Environmental Science, East China Normal University, Shanghai 200062,
China; 4 School of Biotechnology, Key Laboratory of Industrial Biotechnology, Jiangnan University, Wuxi 214036, China

Submitted: March 31, 2008; Returned to authors for corrections: June 23, 2008; Approved: March 31, 2009.



ABSTRACT 

The purpose of this work was to study the giant strong component (GSC) of the B. thuringiensis metabolic
network by structural and functional analysis. Based on the so-called "bow tie" structure, we extracted the
GSC and studied its functional significance. Global structural properties such as degree distribution and
average path length were computed and indicated that the GSC is also a small-world and scale-free network.
Furthermore, the GSC was decomposed into modules, and the functional significance of these divisions for
metabolism was investigated by comparison with KEGG metabolic pathways.
INTRODUCTION

Advances in the emerging field of systems biology in recent
years have fuelled the expectation that we could understand
cellular behaviors by discovering how function arises from the
interactions of cellular components (19). High-throughput (HT)
technologies allow us to list all of these cellular components
for an organism on the genome scale, and thus more and more
biochemical networks are being reconstructed, such as metabolic
networks (8-10), transcriptional regulatory networks (15) and
signaling networks (28).

However, due to the combinatorial explosion of pathways, it is
difficult or even impossible to apply traditional pathway analysis
methods (33,34) to these reconstructed networks. A way forward
is provided by the rapidly developing field of complex networks,
in which graph representation is widely used (1,3,5,16,23,26).
For instance, a metabolic network can be represented by a
so-called metabolite graph in which the nodes are metabolites
and the links are reactions. The fundamental organizational
principles that underlie networks can then be discovered from
global topological properties such as the so-called "small-world"
(36) and "scale-free" (2) properties. Furthermore, to discover
functional units in metabolic networks, it has been suggested
that metabolic networks should have modularity (29,31,32), as do
other complex networks such as social networks, the Internet
and the World Wide Web.

In this article, metabolic reaction data are first used to
generate a metabolic network with 830 nodes and 1132 links for
B. thuringiensis, an important insecticidal bacterium (11,22).
Subsequently, the structure of the B. thuringiensis metabolic
network is analyzed and discussed on the basis of the "bow tie"
structure proposed by Ma and Zeng (24), with emphasis on the
giant strong component (GSC). Finally, the functional
significance, global structural properties and modularity of the
GSC of the B. thuringiensis metabolic network are studied.

MATERIALS AND METHODS 

Data Acquisition and Representation 

To investigate the topological properties of the metabolism of
B. thuringiensis, we first obtained all metabolic reactions
involved in the metabolic network of B. thuringiensis from the
KEGG database (17), and numbered each metabolite according to
the corresponding compound in the KEGG LIGAND database. For
instance, metabolite 246 corresponds to compound C00246
(butanoate) in the KEGG database. Subsequently, all of the
reactions were revised on the basis of a KEGG-based database
developed by Ma and Zeng (25): 1) obvious inconsistencies were
corrected; 2) the reversibility of every reaction was confirmed;
3) the current metabolites and small molecules such as ATP, ADP,
NADH and H2O were excluded, with the purpose of reflecting
biologically meaningful transformations. Finally, the
reconstructed metabolic network is represented by a so-called
metabolite graph in which the nodes are metabolites and the
links are reactions. For example, the irreversible reaction
64 + 26 → 25 is represented by two directed arcs, 64 → 25 and
26 → 25.
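
As a minimal sketch of the representation described above, the Python snippet below expands reaction records into the directed arcs of a metabolite graph. The record layout (substrates, products, reversible), the function name metabolite_graph and the second reaction are assumptions made for illustration; only the first reaction is taken from the text.

```python
from itertools import product

# Hypothetical reaction records: (substrates, products, reversible).
# Metabolite numbers follow the KEGG compound numbering used in the text.
reactions = [
    ((64, 26), (25,), False),  # irreversible example from the text: 64 + 26 -> 25
    ((22,), (24,), True),      # invented reversible reaction, for illustration only
]


def metabolite_graph(reactions):
    """Return the set of directed arcs (substrate, product) of the metabolite graph."""
    arcs = set()
    for substrates, products, reversible in reactions:
        for s, p in product(substrates, products):
            arcs.add((s, p))
            if reversible:
                arcs.add((p, s))  # a reversible reaction contributes both directions
    return arcs


arcs = metabolite_graph(reactions)
print(sorted(arcs))
```

Storing the arcs in a set means that two reactions linking the same pair of metabolites produce a single arc; whether the original study merged such duplicates is not stated.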

Bow Tie Structure 

Since Ma and Zeng proposed the "bow tie" structure of metabolic
networks (24), it has been increasingly recognized as a
conserved property of complex networks, as highlighted by
recent studies (6,20,21,37). The results suggest that this
structural property is functionally meaningful for metabolism,
disease and the design principles of biological robustness.

Generally speaking, a network with the "bow tie" structure
can be decomposed into four parts: 1) the giant strong component
(GSC), 2) the substrate subset (S), 3) the product subset (P),
and 4) the isolated subset (IS) (24). The GSC is the largest
strongly connected component of a metabolic network.
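
The decomposition can be sketched in a few lines of Python. The text defines only the GSC, so the definitions used here for the other parts are assumptions: S is taken as the nodes that can reach the GSC, P as the nodes reachable from the GSC, and IS as everything else. The function names are hypothetical, and the strongly connected components are found by a simple reachability intersection rather than an optimized algorithm.

```python
from collections import defaultdict


def reachable(adj, sources):
    """All nodes reachable from `sources` along the arcs in `adj` (iterative DFS)."""
    seen, stack = set(sources), list(sources)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def bow_tie(arcs):
    """Split the nodes into GSC, S, P and IS (S, P and IS definitions are assumed)."""
    fwd, rev, nodes = defaultdict(set), defaultdict(set), set()
    for u, v in arcs:
        fwd[u].add(v)
        rev[v].add(u)
        nodes.update((u, v))
    # The strongly connected component of a node is the intersection of
    # its forward-reachable and backward-reachable sets; keep the largest.
    gsc, assigned = set(), set()
    for n in nodes:
        if n in assigned:
            continue
        scc = reachable(fwd, [n]) & reachable(rev, [n])
        assigned |= scc
        if len(scc) > len(gsc):
            gsc = scc
    s = reachable(rev, gsc) - gsc  # nodes that can reach the GSC
    p = reachable(fwd, gsc) - gsc  # nodes that can be reached from the GSC
    isolated = nodes - gsc - s - p
    return gsc, s, p, isolated


example = [(1, 2), (2, 3), (3, 1), (0, 1), (3, 4), (5, 6)]
gsc, s, p, isolated = bow_tie(example)
print(gsc, s, p, isolated)
```

For a network of a few hundred nodes this quadratic approach is fast enough; a linear-time method such as Tarjan's algorithm would be the usual choice for larger graphs.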

Degree Distribution and Average Path Length 

The most direct reflection of the differences among the numerous
metabolites in a metabolic network is the connection degree k,
which is the number of links that a node has to other nodes; the
degree distribution P(k) gives the probability that a node has
degree k. One of the most important properties of metabolic
networks is the power-law degree distribution, i.e.
$P(k) \sim k^{-\gamma}$ ($\gamma = 2.2$), which means that most
of the nodes in the network have a low degree, while a few nodes
have a very high degree (16,35). In other words, a metabolic
network is a typical scale-free network (2). It has also been
suggested that the average path length of metabolic networks is
very small (16,25), which is the "small-world" property. Another
structural parameter is the network diameter, which is defined
as the length of the longest path among all of the shortest
paths (4).
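
The quantities defined above can be computed directly from a list of directed arcs. The following Python sketch is illustrative only: the function names are hypothetical, the degree is assumed to be the sum of in-degree and out-degree, and path lengths are averaged over ordered pairs of nodes that are connected by a directed path. The source does not state which conventions were used.

```python
from collections import Counter, defaultdict, deque


def degree_distribution(arcs):
    """P(k): fraction of nodes with total degree k (in-degree plus out-degree)."""
    degree = Counter()
    for u, v in arcs:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}


def path_statistics(arcs):
    """Average shortest-path length and diameter over all reachable ordered pairs."""
    adj, nodes = defaultdict(list), set()
    for u, v in arcs:
        adj[u].append(v)
        nodes.update((u, v))
    total, pairs, diameter = 0, 0, 0
    for source in nodes:
        # Breadth-first search gives shortest path lengths from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter


cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]  # toy directed cycle
print(degree_distribution(cycle))
print(path_statistics(cycle))
```

A power-law distribution appears as an approximately straight line when P(k) is plotted against k on log-log axes.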

Modularity and Simulated Annealing Algorithm 

An important property related to the detection of modules is
modularity. For a given partition of the nodes of a network
into modules, the modularity M of this partition is defined as
follows (14,27):

$$M = \sum_{s=1}^{r} \left[ \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 \right] \qquad (1)$$

where r is the number of modules, $l_s$ is the number of links
between nodes in module s, $d_s$ is the sum of the degrees of the
nodes in module s, and L is the total number of links in the
network. It has been suggested that maximization of the
modularity function yields the most accurate results for random
networks, and it is widely used for the identification of
modules (12,13).
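
To make the definition concrete, the sketch below evaluates the modularity of a partition from a list of links, using the symbols defined above (r modules, l_s within-module links, d_s summed degrees, L total links). It assumes that links are treated as undirected; the function name and the toy network of two triangles joined by one link are illustrative and not taken from the study.

```python
def modularity(links, module_of):
    """Modularity M = sum over modules s of l_s / L - (d_s / (2 L))**2."""
    L = len(links)
    within = {}      # l_s: links with both ends in module s
    degree_sum = {}  # d_s: sum of the degrees of the nodes in module s
    for u, v in links:
        mu, mv = module_of[u], module_of[v]
        degree_sum[mu] = degree_sum.get(mu, 0) + 1
        degree_sum[mv] = degree_sum.get(mv, 0) + 1
        if mu == mv:
            within[mu] = within.get(mu, 0) + 1
    return sum(within.get(s, 0) / L - (d / (2 * L)) ** 2
               for s, d in degree_sum.items())


# Two triangles joined by a single link (3, 4).
links = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_modules = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_module = {n: "A" for n in range(1, 7)}
print(modularity(links, two_modules))
print(modularity(links, one_module))
```

Placing every node in a single module gives M = 0, so a positive value indicates more within-module links than expected from the degrees alone.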

Simulated annealing (18) is a stochastic optimization
technique that can find a 'low cost' configuration without
getting trapped in 'high cost' local minima. As mentioned above,
the method based on simulated annealing tries to find the
optimal partition into modules by maximizing the network
modularity (12,13), and thus the cost is C = -M, where M is the
modularity defined in equation (1). At each temperature T, a
number of random updates are performed and accepted with
probability

$$p = \begin{cases} 1 & \text{if } C_2 \le C_1 \\ \exp\left( -\dfrac{C_2 - C_1}{T} \right) & \text{if } C_2 > C_1 \end{cases}$$

where $C_2$ and $C_1$ are the cost after the update and before
the update, respectively, and T is the computational
temperature. Specifically, at each temperature T there are
$n_i = fS^2$ individual movements of nodes from one module to
another and $n_c = fS$ collective movements, where S is the
number of nodes in the network and f is an iteration factor with
a recommended range of 0.1 to 1. After the movements at a given
temperature T, the system is cooled down to $T' = cT$, where c is
the cooling factor.
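
The acceptance rule and cooling schedule can be sketched as follows. This is a generic annealing loop under stated assumptions, not the implementation used in the study: the function names, the default temperatures, the fixed number of moves per temperature and the toy cost function are all illustrative. For module detection, the cost would be -M and a proposal would move nodes between modules.

```python
import math
import random


def acceptance_probability(cost_before, cost_after, temperature):
    """Probability of accepting an update, following the rule stated above."""
    if cost_after <= cost_before:
        return 1.0
    return math.exp(-(cost_after - cost_before) / temperature)


def anneal(cost, propose, state, t_start=1.0, t_end=1e-3, c=0.9,
           moves_per_t=100, seed=0):
    """Generic annealing loop; all default values are illustrative assumptions."""
    rng = random.Random(seed)
    current_cost = cost(state)
    best, best_cost = state, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            candidate = propose(state, rng)
            candidate_cost = cost(candidate)
            p = acceptance_probability(current_cost, candidate_cost, t)
            if rng.random() < p:
                state, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = state, current_cost
        t *= c  # cooling step T' = cT
    return best, best_cost


# Toy usage: minimise (x - 3)^2 over the integers with unit steps.
best, best_cost = anneal(lambda x: (x - 3) ** 2,
                         lambda x, rng: x + rng.choice([-1, 1]),
                         state=0)
print(best, best_cost)
```

Because worse configurations are accepted with a probability that shrinks as T decreases, the search can leave local minima early on and settles as the system cools.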

RESULTS AND DISCUSSION 

Bow Tie Structure and Extraction of GSC 

The metabolic network of B. thuringiensis was reconstructed
using the methods introduced in the Data Acquisition and
Representation section. The network contains 830 nodes and 1132
links, and its global topology is shown in Fig. 1. It is clear
that the whole network is far from strongly connected and
includes many isolated reactions. The whole metabolic network of
B. thuringiensis was then decomposed into four parts based on
the "bow tie" structure (Table 1). It should be noted that most
nodes in the S, P and IS parts are connected by a single link
and are not of interest here, while the metabolites and
reactions involved in the GSC are clearly far fewer than in the
whole network, so the GSC could be used to reduce the complexity
of applying other pathway analysis methods such as extreme
pathways and elementary modes (33,34). Furthermore, the GSC is
the largest strongly connected component of a metabolic network
and determines the structure of the entire network to a certain
extent (24,37); it is therefore analyzed in more detail here.

All of the 268 metabolic reactions in the GSC were compared with
KEGG pathways, which showed that they are mainly concentrated in
carbohydrate metabolism and amino acid metabolism (Table 2).
The reactions of carbohydrate metabolism correspond accurately
to glycolysis, the TCA cycle and the pentose phosphate pathway,
and partly to pyruvate metabolism and butanoate metabolism. From
the point of view of network topology, the results show that
metabolites in carbohydrate metabolism (in particular
glycolysis, the TCA cycle and the pentose phosphate pathway,
i.e. the central metabolism) are more likely to have many links
and stronger robustness in the network, and thus might have
higher attack tolerance despite external cues, genetic variation
and stochastic noise. The reactions of amino acid metabolism, in
contrast, are mainly concentrated in urea cycle and metabolism
of amino groups, arginine and proline metabolism, and
glycerophospholipid metabolism, which might reveal the nutrient
requirements of B. thuringiensis.

Figure 1. Metabolic network topology of B. thuringiensis. The
nodes correspond to metabolites and the lines correspond to
reactions. The picture was drawn using the Pajek program.

Table 1. The bow tie structure of the B. thuringiensis metabolic
network. GSC (giant strong component), S (substrate subset),
P (product subset) and IS (isolated subset).

Subsets                      GSC      S       P       IS      Total
No. of metabolites           118      73      190     449     830
Percentage of metabolites    14.2%    8.8%    22.9%   54.1%   100%
No. of reactions             268      82      252     530     1132
Percentage of reactions      23.7%    7.2%    22.3%   46.8%   100%

Table 2. Reactions in the GSC of the B. thuringiensis metabolic network.

Pathways                     No. of reactions    Percentage of reactions
Carbohydrate Metabolism      140                 52.2%
Amino Acid Metabolism        84                  31.3%
Energy Metabolism            24                  9.0%
Lipid Metabolism             8                   3.0%
Others                       12                  4.5%
Total                        268                 100%

Degree Distribution and Average Path Length

We first checked the scale-free property of the GSC of the B.
thuringiensis metabolic network (Fig. 2). As is known, the nodes
with high degree in a scale-free network dominate the
network structure and make the network robust against random
errors such as mutations and environmental changes. Ma and
Zeng identified the 20 primary metabolites with the highest
degree for 80 fully sequenced organisms and suggested that these
metabolites are almost the same across organisms (25). This
result is partly reaffirmed in this study (Table 3): 6 of the 10
hub metabolites of the GSC of the B. thuringiensis metabolic
network are present in their list (PYR, GLU, AcCoA, ICIT, ASP
and SUC), while the remaining 4 metabolites are not. Among these
4 metabolites, BuCoA is the key metabolite linking butanoate
metabolism and fatty acid metabolism, and it has been suggested
to play a key role in a novel pathway of PHB metabolism (7).
2HPP links the glycolysis pathway, the pentose phosphate pathway
and carbon fixation; E4P links the pentose phosphate pathway,
aminosugars metabolism and carbon fixation; and GlyP plays a key
role among the glycolysis pathway, fructose and mannose
metabolism, glycerophospholipid metabolism, carbon fixation, and
nicotinate and nicotinamide metabolism. As links among different
functional metabolic pathways, these hub metabolites (especially
the 4 that differ from Ma's universal hub metabolites), together
with their corresponding reactions, play a key role in metabolic
regulation and may help to reveal the biological significance of
B. thuringiensis metabolism. The average path length is 8.63 and
the network diameter is 24 for the GSC of the B. thuringiensis
metabolic network, which is similar to the values obtained for
other bacteria with the Pathway Hunter Tool (PHT) (30) and by Ma
and Zeng (25) (Table 4).

Table 3. The first 10 hub metabolites of the GSC of the B.
thuringiensis metabolic network.

Degree   Metabolite name                             Abbreviation
16       Pyruvate                                    PYR
16       (2R)-2-Hydroxy-3-(phosphonooxy)-propanal    2HPP
14       Glycerone phosphate                         GlyP
13       L-Glutamate                                 GLU
12       Acetyl-CoA                                  AcCoA
12       Isocitrate                                  ICIT
10       D-Erythrose 4-phosphate                     E4P
9        L-Aspartate                                 ASP
9        Butanoyl-CoA                                BuCoA
8        Succinate                                   SUC

Figure 2. Log-log plot of the degree distribution for the GSC
of the B. thuringiensis metabolic network.

Table 4. Average path length (AL) and diameter (D) of several
bacteria.

Organisms                 Abbreviation   AL      D
Bacillus subtilis         bsu            8.48    23
Escherichia coli          eco            8.16    23
Haemophilus influenzae    hin            8.35    27
Helicobacter pylori       hpj            7.91    24
Salmonella typhimurium    stm            8.22    24

Modules of GSC

Different decompositions of the giant strong component of the
B. thuringiensis metabolic network were obtained with the
simulated annealing algorithm for different values of the
iteration factor (f) and the cooling factor (c) mentioned in
Materials and Methods; we chose the best decomposition (Table 5,
Fig. 3) after a number of runs. The result gives a clear
partition, with the number of metabolites, total links,
within-module links and between-module links in each module; the
modularity of the partition is 0.752183. The decomposition is
also supported by comparison with KEGG metabolic pathways: most
modules correspond mainly to one or two KEGG pathways (Table 6).
As examples of the former case, 11 of the 12 within-module links
in module 4 correspond to glycerophospholipid metabolism, and 11
of the 15 within-module links in module 1 correspond to
butanoate metabolism. As an example of the latter case, 10 and 8
of the 24 within-module links in module 7 correspond to arginine
and proline metabolism and to urea cycle and metabolism of amino
groups, respectively.

Figure 3. Modules in the GSC of the B. thuringiensis metabolic network. The picture was drawn using the Pajek program.



Table 5. Decomposed results of the GSC of B. thuringiensis
metabolic network.

Module       Nodes   Total links   Within links   Between links
1            14      16            15             1
2            9       16            10             6
3            20      32            28             4
4            10      14            12             2
5            8       11            7              4
6            17      27            19             8
7            18      26            24             2
8            9       13            8              5
9            13      22            16             6
Modularity   0.752183
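
The reported modularity can be cross-checked from the link counts in Table 5 using the definition M = sum over modules of l_s / L - (d_s / (2L))^2. The short Python calculation below is a sketch resting on one assumption that the source does not state: each between-module link is listed once in each of the two modules it connects, so the "Between links" column counts every such link twice. Under that assumption the total is L = 158 links and the computed value agrees with the reported 0.752183 to six decimal places.

```python
# Rows of Table 5: (within-module links, between-module links) for modules 1-9.
table5 = [(15, 1), (10, 6), (28, 4), (12, 2), (7, 4),
          (19, 8), (24, 2), (8, 5), (16, 6)]

# Assumption: every between-module link appears in the rows of both
# modules it connects, so the column total counts each link twice.
total_within = sum(w for w, _ in table5)
total_between = sum(b for _, b in table5) // 2
L = total_within + total_between  # total number of links in the GSC graph

# For module s: l_s = within-module links, d_s = 2 * l_s + between-module links.
M = sum(w / L - ((2 * w + b) / (2 * L)) ** 2 for w, b in table5)
print(L, round(M, 6))
```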









Table 6. Comparison of the decomposition of the GSC of the B.
thuringiensis metabolic network with KEGG metabolic pathways.

Module   Pathways in KEGG
1        butanoate metabolism
2        pyruvate metabolism
3        glycolysis, pentose phosphate pathway, carbon fixation
4        glycerophospholipid metabolism
5        *
6        pyruvate metabolism, several amino acid biosynthesis
7        arginine and proline metabolism, urea cycle and metabolism of amino groups
8        *
9        TCA cycle

* The corresponding module includes several pathways, and it is
difficult to assign it to one or two simple pathways.



CONCLUSION 

With the explosion of knowledge in 'X-omics' and systems
biology, more and more genome-scale metabolic networks are being
reconstructed (8-10). To discover the functional information
contained in metabolic networks, a number of approaches based on
topological structure have been developed, and it has been
suggested that such computational modeling and analysis can
contribute a great deal to the understanding of the structure
and function of these systems (1,3,5,16,23,26).

Taken together, this study is an attempt to explore the
fundamental organizational principles that underlie the B.
thuringiensis metabolic network. We began by integrating data
from KEGG and a related database, and the reconstructed
metabolic network was represented by a metabolite graph.
Considering that many isolated reactions are included in the
whole metabolic network, we extracted its most important part,
the giant strong component (GSC), and analyzed its global
structural properties and biological implications. We validated
the "small-world" and "scale-free" characters and analyzed the
first 10 hub metabolites of the GSC accordingly. Finally, the
functional modules in the GSC were studied with respect to their
biological significance.

SUMMARY

Structural and functional analysis of the GSC (Giant
Strong Component) of the metabolic network of
Bacillus thuringiensis

The aim of this work was to carry out a structural and
functional analysis of the GSC (Giant Strong Component) of the
metabolic network of Bacillus thuringiensis. Based on the
bow-tie structure, the GSC was extracted and analyzed with
regard to its functional significance. Global structural
properties such as degree distribution and average path length
were measured, leading to the conclusion that the GSC is also a
small-world and scale-free network. In addition, the GSC network
was decomposed, and the divisions with functional significance
for metabolism were compared with the KEGG metabolic pathways.

Key words: Bacillus thuringiensis, Giant Strong
Component, metabolic network